Introduction
This paper proposes a method for refining vector space representations using relational information from semantic lexicons by encouraging linked words to have similar vector representations.
It makes no assumptions about how the input vectors were constructed.
The contribution of this paper is a graph-based learning technique, called retrofitting, that uses lexical relational resources to obtain higher-quality semantic vectors.
It is a post-processing step that runs belief propagation on a graph constructed from lexicon-derived relational information to update the word vectors.
The new vectors are encouraged to be:
- similar to the vectors of related word types
- similar to their purely distributional representation
They show that retrofitting gives consistent improvements on evaluation benchmarks across word vectors of different lengths.
Retrofitting with Semantic Lexicons
$\hat{Q}$ denotes the matrix of the original word vectors. The objective is to learn the matrix $Q=(q_1,\dots,q_n)$ such that the columns are both close (under a distance metric) to their counterparts in $\hat{Q}$ and to adjacent vertices in the lexicon graph $\Omega$:
$\Psi(Q)=\sum_{i=1}^{n}\Big[\alpha_i||q_i-\hat{q_i}||^2 + \sum_{j:(i,j)\in E} \beta_{ij} ||q_i-q_j||^2\Big]$
Taking the first derivative of $\Psi$ with respect to one vector $q_i$ and setting it to zero yields the following online update:
$q_i=\frac{\sum_{j:(i,j) \in E} \beta_{ij}q_j+\alpha_i \hat{q_i}}{\sum_{j:(i,j) \in E} \beta_{ij}+\alpha_i}$
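The update above can be sketched in a few lines. This is a minimal illustration, not the paper's released code; the choices $\alpha_i = 1$ and $\beta_{ij} = \mathrm{degree}(i)^{-1}$ follow values mentioned in the paper, and `edges` is an assumed adjacency structure derived from the lexicon.

```python
import numpy as np

def retrofit(q_hat, edges, alpha=1.0, n_iters=10):
    """Iteratively apply the closed-form update for each q_i.

    q_hat : (n, d) array of original word vectors (held fixed).
    edges : dict mapping word index i -> list of neighbor indices j
            in the lexicon graph.
    alpha : weight on staying close to the original vector.
    """
    q = q_hat.copy()
    for _ in range(n_iters):
        for i, neighbors in edges.items():
            if not neighbors:
                continue
            beta = 1.0 / len(neighbors)  # beta_ij = degree(i)^{-1}
            num = alpha * q_hat[i] + beta * sum(q[j] for j in neighbors)
            den = alpha + beta * len(neighbors)
            q[i] = num / den  # the closed-form update derived above
    return q
```

Words absent from the lexicon graph keep their original vectors, since they are never updated; the paper reports that about 10 iterations suffice for convergence.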
Semantic Lexicons during Learning
The retrofitting approach above is applied as a second, post-processing stage of learning. Alternatively, the lexicon can be used during training itself; in that setting, the semantic lexicon plays the role of a prior on $Q$, defined as follows:
$p(Q) \propto \exp\Big(-\gamma \sum_{i=1}^n \sum_{j:(i,j)\in E} \beta_{ij}||q_i-q_j||^2\Big)$
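Taking the log of this prior shows the extra term it contributes to the training gradient. Differentiating with respect to a single $q_i$ (counting each edge once; a symmetric edge $(j,i)$ would contribute an analogous term) gives:

$\frac{\partial \log p(Q)}{\partial q_i} = -2\gamma \sum_{j:(i,j)\in E} \beta_{ij}(q_i - q_j)$

so MAP training simply pulls each $q_i$ toward its lexicon neighbors in proportion to $\gamma$.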
Evaluation Benchmarks
Word Similarity
- WS-353
- RG-65
- MEN
They compute the cosine similarity between the vectors of the two words forming each test item, and report Spearman's rank correlation coefficient between the model's rankings and the human rankings.
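The evaluation protocol above is easy to reproduce. A minimal sketch, using NumPy only and ignoring rank ties for simplicity (a full implementation would use tie-corrected ranks, e.g. `scipy.stats.spearmanr`):

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two word vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks (ties ignored)."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float(rx @ ry / (np.linalg.norm(rx) * np.linalg.norm(ry)))
```

Given a benchmark, `x` would hold the model's cosine similarities per word pair and `y` the corresponding human judgments.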
Syntactic Relations (SYN-REL)
The task is to find a word $d$ that best fits the following relationship: “$a$ is to $b$ as $c$ is to $d$”, given $a$, $b$, and $c$.
Synonym Selection (TOEFL)
Sentiment Analysis (SA)
Experiments
With the retrofitting method, PPDB is the best-performing lexicon.
They compare LBL, LBL+lazy, LBL+periodic, and LBL+retrofitting; LBL+lazy and LBL+periodic incorporate the lexicon prior during training.
The lazy method computes, once every $k$ words, the summed gradient of the log-likelihood and the log-prior.
The periodic method instead updates all word vectors every few iterations using the retrofitting update above.
The LBL objective differs from that of word2vec in that the latter uses softmax.
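A minimal sketch of the prior-gradient step the lazy method applies. Here `gamma`, `beta`, and the learning rate are illustrative placeholders, and in the real training loop this step is interleaved with LBL's log-likelihood gradients rather than run alone:

```python
import numpy as np

def lazy_prior_step(q, edges, gamma=0.1, beta=1.0, lr=0.01):
    """One gradient step on the log-prior:
    grad_i = -2 * gamma * sum_j beta_ij * (q_i - q_j)."""
    g = np.zeros_like(q)
    for i, neighbors in edges.items():
        for j in neighbors:
            g[i] += -2.0 * gamma * beta * (q[i] - q[j])
    return q + lr * g  # ascend the log-prior, pulling neighbors together
```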
Analysis
When the lexicon is incorporated during learning, the periodic method works better than lazy. The method proposed in this paper outperforms both lazy and periodic.